A flexible shrinkage operator for fussy grouped variable selection
نویسنده
چکیده
Existing grouped variable selection methods rely heavily on prior group information, thus they may not be reliable if an incorrect group assignment is used. In this paper, we propose a family of shrinkage variable selection operators by controlling the k-th largest norm (KAN). The proposed KAN method exhibits some flexible group-wise variable selection naturally even though no correct prior group information is available. We also construct a group KAN shrinkage operator using a composite of KAN constraints. Neither ignoring nor relying completely on prior group information, the group KAN method has the flexibility of controlling within group strength and therefore can reduce the effect caused by incorrect group information. Finally, we investigate an unbiased estimator of the degrees of freedom for (group) KAN estimates in the framework of Stein’s unbiased risk estimation. Extensive simulation studies and real data analysis are performed to demonstrate the advantage of KAN and group KAN over the LASSO and group LASSO, respectively.
منابع مشابه
Differenced-Based Double Shrinking in Partial Linear Models
Partial linear model is very flexible when the relation between the covariates and responses, either parametric and nonparametric. However, estimation of the regression coefficients is challenging since one must also estimate the nonparametric component simultaneously. As a remedy, the differencing approach, to eliminate the nonparametric component and estimate the regression coefficients, can ...
متن کاملJoint Variable Selection and Classification with Immunohistochemical Data
To determine if candidate cancer biomarkers have utility in a clinical setting, validation using immunohistochemical methods is typically done. Most analyses of such data have not incorporated the multivariate nature of the staining profiles. In this article, we consider modelling such data using recently developed ideas from the machine learning community. In particular, we consider the joint ...
متن کاملShrinkage Estimation of Semiparametric Model with Missing Responses for Cluster Data
This paper simultaneously investigates variable selection and imputation estimation of semiparametric partially linear varying-coefficient model in that case where there exist missing responses for cluster data. As is well known, commonly used approach to deal with missing data is complete-case data. Combined the idea of complete-case data with a discussion of shrinkage estimation is made on di...
متن کاملVariable selection for multiply-imputed data with application to dioxin exposure study.
Multiple imputation (MI) is a commonly used technique for handling missing data in large-scale medical and public health studies. However, variable selection on multiply-imputed data remains an important and longstanding statistical problem. If a variable selection method is applied to each imputed dataset separately, it may select different variables for different imputed datasets, which makes...
متن کاملVariable Selection in Nonparametric and Semiparametric Regression Models
This chapter reviews the literature on variable selection in nonparametric and semiparametric regression models via shrinkage. We highlight recent developments on simultaneous variable selection and estimation through the methods of least absolute shrinkage and selection operator (Lasso), smoothly clipped absolute deviation (SCAD) or their variants, but restrict our attention to nonparametric a...
متن کامل